Conclusion:
What is Spark?
A fast and general engine for large-scale data processing
- An open-source cluster computing framework
- End-to-end analytics platform
- Developed to overcome the limitations of Hadoop MapReduce
- Runs on anything from a single desktop to a large cluster
- Supports iterative, interactive, and stream processing
- Supports multiple languages: Scala, Python, R, Java
- Used by major companies such as Amazon, eBay, and Yahoo
Advantages of Spark
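- A fast-growing open-source engine
- Many times faster than MapReduce, particularly for workloads that reuse the same data
- Can keep working data in memory between operations instead of writing it back to disk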
- Runs alongside other Hadoop components
- Support for many programming languages: Scala, R, Python, Java, plus piping to external programs
- Same functionality across multiple languages
- Built-in libraries for graph processing, SQL, machine learning, and streaming
- Works with multiple cluster management frameworks
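
The "keeps data in memory" advantage matters most for iterative jobs, where the same dataset is read on every pass. The sketch below illustrates that idea only. It is plain Python rather than Spark's API, and every name in it (make_loader, iterate_without_cache, iterate_with_cache) is hypothetical, chosen for illustration.

```python
# Plain-Python stand-in for the caching idea; none of these names are Spark APIs.

def make_loader(records):
    """Return a loader that simulates an expensive read, plus a call counter."""
    calls = {"count": 0}

    def load():
        calls["count"] += 1
        # Pretend this is slow work, such as reading and parsing from disk.
        return [r * 2 for r in records]

    return load, calls


def iterate_without_cache(load, passes):
    """Reload the dataset on every pass, as a disk-based pipeline would."""
    totals = []
    for _ in range(passes):
        data = load()
        totals.append(sum(data))
    return totals


def iterate_with_cache(load, passes):
    """Load once, keep the result in memory, and reuse it on every pass."""
    data = load()
    return [sum(data) for _ in range(passes)]


load_a, calls_a = make_loader(range(5))
load_b, calls_b = make_loader(range(5))

uncached = iterate_without_cache(load_a, passes=3)
cached = iterate_with_cache(load_b, passes=3)

print(uncached == cached)                   # True: same results either way
print(calls_a["count"], calls_b["count"])   # 3 1: the cached version loads once
```

Both versions compute the same totals, but the cached one performs the expensive load once instead of once per pass. The saving grows with the number of passes, which is why in-memory reuse helps iterative and interactive workloads in particular.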